Search CORE

88 research outputs found

PiSDF: Parameterized & Interfaced Synchronous Dataflow for MPSoCs Runtime Reconfiguration

Author: Desnos Karol
Heulot Julien
Publication venue: HAL CCSD
Publication date: 06/10/2014
Field of study

International audienceDataflow models of computation are widely used for the specification, analysis, and optimization of Digital Signal Processing (DSP) applications. In this talk, we present the Parameterized and Interfaced Synchronous Dataflow (πSDF) model that addresses the important challenge of managing dynamics in DSP-oriented representations. In addition to cap-turing application parallelism, which is an intrinsic feature of dataflow models, πSDF enables the specification of hierarchical and reconfigurable applications. The Synchronous Parameterized and Interfaced Dataflow Embedded Runtime (SPIDER) is also presented to support the execution of πSDF specifications on heterogeneous Multiprocessor Systems-on-Chips (MPSoCs)

HAL-Rennes 1

Numerical Representation of Directed Acyclic Graphs for Efficient Dataflow Embedded Resource Allocation

Author: Arrestier Florian
Desnos Karol
Juarez Eduardo
Menard Daniel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

International audienceStream processing applications running on Heterogeneous Multi-Processor Systems on Chips (HMPSoCs) require efficient resource allocation and management, both at compile-time and at runtime. To cope with modern adaptive applications whose behavior can not be exhaustively predicted at compile-time, runtime managers must be able to take resource allocation decisions on-the-fly, with a minimum overhead on application performance. Resource allocation algorithms often rely on an internal modeling of an application. Directed Acyclic Graph (DAGs) are the most commonly used models for capturing control and data dependencies between tasks. DAGs are notably often used as an intermediate representation for deploying applications modeled with a dataflow Model of Computation (MoC) on HMPSoCs. Building such intermediate representation at runtime for massively parallel applications is costly both in terms of computation and memory overhead. In this paper, an intermediate representation of DAGs for resource allocation is presented. This new representation shows improved performance for run-time analysis of dataflow graphs with less overhead in both computation time and memory footprint. The performances of the proposed representation are evaluated on a set of computer vision and machine learning applications

HAL-CentraleSupelec

HAL-Rennes 1

Memory Bounds for the Distributed Execution of a Hierarchical Synchronous Data-Flow Graph

Author: Aridhi Slaheddine
Desnos Karol
Nezan Jean François
Pelcat Maxime
Publication venue: HAL CCSD
Publication date: 01/01/2012
Field of study

International audienceThis paper presents an application analysis technique to define the boundary of shared memory requirements of Multiprocessor System-on-Chip (MPSoC) in early stages of development. This technique is part of a rapid prototyping process and is based on the analysis of a hierarchical Synchronous Data-Flow (SDF) graph description of the system application. The analysis does not require any knowledge of the system architecture, the mapping or the scheduling of the system application tasks. The initial step of the method consists of applying a set of transformations to the SDF graph so as to reveal its memory characteristics. These transformations produce a weighted graph that represents the different memory objects of the application as well as the memory allocation constraints due to their relationships. The memory boundaries are then derived from this weighted graph using analogous graph theory problems, in particular the Maximum-Weight Clique (MWC) problem. Stateof-the-art algorithms to solve these problems are presented and a heuristic approach is proposed to provide a near-optimal solution of the MWC problem. A performance evaluation of the heuristic approach is presented, and is based on hierarchical SDF graphs of realistic applications. This evaluation shows the efficiency of proposed heuristic approach in finding near optimal solutions

CiteSeerX

HAL-Rennes 1

Pre- and post-scheduling memory allocation strategies on MPSoCs

Author: Aridhi Slaheddine
Desnos Karol
Nezan Jean François
Pelcat Maxime
Publication venue: HAL CCSD
Publication date: 31/05/2013
Field of study

6 pagesInternational audienceThis paper introduces and assesses a new method to allocate memory for applications implemented on a shared memory Multiprocessor System-on-Chip (MPSoC). This method first consists of deriving, from a Synchronous Dataflow (SDF) algorithm description, a Memory Exclusion Graph (MEG) that models all the memory objects of the application and their allocation constraints. Based on the MEG, memory allocation can be performed at three different stages of the implementation process: prior to the scheduling process, after an untimed multicore schedule is decided, or after a timed multicore schedule is decided. Each of these three alternatives offers a distinct trade-off between the amount of allocated memory and the flexibility of the application multicore execution. Tested use cases are based on descriptions of real applications and a set of random SDF graphs generated with the SDF For Free (SDF3) tool. Experimental results compare several allocation heuristics at the three implementation stages. They show that allocating memory after an untimed schedule of the application has been decided offers a reduced memory footprint as well as a flexible multicore execution

Memory Bounds for the Distributed Execution of a Hierarchical Synchronous Data-Flow Graph

Author: Aridhi Slaheddine
Desnos Karol
Nezan Jean François
Pelcat Maxime
Publication venue: HAL CCSD
Publication date: 16/07/2012
Field of study

PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming

Author: Aridhi Slaheddine
Desnos Karol
Guy Clément
Heulot Julien
Nezan Jean François
Pelcat Maxime
Publication venue: HAL CCSD
Publication date: 11/09/2014
Field of study

International audienceThe high performance Digital Signal Processors (DSP) currently manufactured by Texas Instruments are heterogeneous multiprocessor architectures. Programming these architectures is a complex task often reserved to specialized engineers because the bottlenecks of both the algorithm and the architecture need to be deeply understood in order to obtain a fairly parallel execution. The PREESM framework objective is to simplify the programming of multicore DSP systems by building on dataflow programming methods. The current functionalities of this scalable framework cover memory and time analysis, as well as automatic deadlock-free code generation. Several tutorials are provided with the tool for fast initiation of C programmers to multicore DSP programming. This paper demonstrates PREESM capabilities by comparing simulation and execution performances on a stereo matching algorithm prototyped on the TMS320C6678 8-core DSP device

Crossref

HAL-Rennes 1

Models of Architecture: Reproducible Efficiency Evaluation for Signal Processing Systems

Author: Bhattacharyya Shuvra,
Desnos Karol
Heulot Julien
Liu Yanzhou
Maggiani Luca
Nezan Jean François
Pelcat Maxime
Publication venue: HAL CCSD
Publication date: 26/10/2016
Field of study

International audienceThe current trend in high performance and embedded signal processing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. In order to take hardware and software design decisions, early evaluations of the system non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In this paper, we define the notion of Model of Architecture (MoA) and study the combination of a Model of Computation (MoC) and an MoA to provide a design space exploration environment for the study of the algorithmic and architectural choices. A cost is computed from the mapping of an application, represented by a model conforming a MoC onto an architecture represented by a model conforming an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system such as memory, energy, throughput or latency

Models of Architecture: Reproducible Efficiency Evaluation for Signal Processing Systems

Author: Bhattacharyya Shuvra,
Desnos Karol
Heulot Julien
Liu Yanzhou
Maggiani Luca
Nezan Jean François
Pelcat Maxime
Publication venue: HAL CCSD
Publication date: 26/10/2016
Field of study

HAL-CentraleSupelec

Crossref

HAL Clermont Université

HAL-Rennes 1

Models of Architecture

Author: Bhattacharyya Shuvra S.
Desnos Karol
Heulot Julien
Liu Yanzhou
Maggiani Luca
Nezan Jean-François
Pelcat Maxime
Publication venue: HAL CCSD
Publication date: 15/12/2015
Field of study

The current trend in high performance and embedded computing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. In order to take hardware and software design decisions, early evaluations of the system non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In state of the art Model Driven Engineering (MDE) methods, different communities have developed custom architecture models associated to languages of substantial complexity. This fact contrasts with Models of Computation (MoCs) that provide abstract representations of an algorithm behavior as well as tool interoperability.In this report, we define the notion of Model of Architecture (MoA) and study the combination of a MoC and an MoA to provide a design space exploration environment for the study of the algorithmic and architectural choices. An MoA provides reproducible cost computation for evaluating the efficiency of a system. A new MoA called Linear System-Level Architecture Model (LSLA) is introduced and compared to state of the art models. LSLA aims at representing hardware efficiency with a linear model. The computed cost results from the mapping of an application, represented by a model conforming a MoC on an architecture represented by a model conforming an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system such as memory, energy, throughput or latency

HAL-CentraleSupelec

HAL Clermont Université

HAL-Rennes 1

Etude mémoire et représentations flux de données pour le prototypage rapide d'applications de traitement du signal sur MPSoCs

Author: Desnos Karol
Publication venue: HAL CCSD
Publication date: 26/09/2014
Field of study

The development of embedded Digital Signal Processing (DSP) applications for Multiprocessor Systems-on-Chips (MPSoCs) is a complex task requiring the consideration of many constraints including real-time requirements, power consumption restrictions, and limited hardware resources. To satisfy these constraints, it is critical to understand the general characteristics of a given application: its behavior and its requirements in terms of MPSoC resources. In particular, the memory requirements of an application strongly impact the quality and performance of an embedded system, as the silicon area occupied by the memory can be as large as 80% of a chip and may be responsible for a major part of its power consumption. Despite the large overhead, limited memory resources remain an important constraint that considerably increases the development time of embedded systems. Dataflow Models of Computation (MoCs) are widely used for the specification, analysis, and optimization of DSP applications. The popularity of dataflow MoCs is due to their great analyzability and their natural expressivity of the parallelism of a DSP application. The abstraction of time in dataflow MoCs is particularly suitable for exploiting the parallelism offered by heterogeneous MPSoCs. In this thesis, we propose a complete method to study the important aspect of memory characteristic of a DSP application modeled with a dataflow graph. The proposed method spans the theoretical, architecture-independent memory characterization to the quasi-optimal static memory allocation of an application on a real shared-memory MPSoC. The proposed method, implemented as part of a rapid prototyping framework, is extensively tested on a set of state-of-the-art applications from the computer-vision, the telecommunication, and the multimedia domains. Then, because the dataflow MoC used in our method is unable to model applications with a dynamic behavior, we introduce a new dataflow meta-model to address the important challenge of managing dynamics in DSP-oriented representations. The new reconfigurable and composable dataflow meta-model strengthens the predictability, the conciseness and the readability of application descriptions.Le développement d’applications de traitement du signal pour des architectures multi-coeurs embarquées est une tâche complexe qui nécessite la prise en compte de nombreuses contraintes. Parmi ces contraintes figurent les contraintes temps réel, les limitations énergétiques, ou encore la quantité limitée des ressources matérielles disponibles. Pour satisfaire ces contraintes, une connaissance précise des caractéristiques des applications à implémenter est nécessaire. La caractérisation des besoins en mémoire d’une application est primordiale car cette propriété a un impact important sur la qualité et les performances finales du système développé. En effet, les composants de mémoire d’un système embarqué peuvent occuper jusqu’à 80% de la surface totale de silicium et être responsable d’une majeure partie de la consommation énergétique. Malgré cela, les limitations mémoires restent une contrainte forte augmentant considérablement les temps de développements. Les modèles de calcul de type flux de données sont couramment utilisés pour la spécification, l’analyse et l’optimisation d’applications de traitement du signal. La popularité de ces modèles est due à leur bonne analysabilité ainsi qu’à leur prédisposition à exprimer le parallélisme des applications. L’abstraction de toute notion de temps dans les diagrammes flux de données facilite l’exploitation du parallélisme offert par les architectures multi-coeurs hétérogènes. Dans cette thèse, nous présentons une méthode complète pour l’étude des caractéristiques mémoires d’applications de traitement du signal modélisées par des diagrammes flux de données. La méthode proposée couvre la caractérisation théorique d’applications, indépendamment des architectures ciblées, jusqu’à l’allocation quasi-optimale de ces applications en mémoire partagée d’architectures multi-coeurs embarquées. L’implémentation de cette méthode au sein d’un outil de prototypage rapide permet son évaluation sur des applications récentes de vision par ordinateur, de télécommunication, et de multimédia. Certaines applications de traitement du signal au comportement très dynamique ne pouvant être modélisé par le modèle de calcul supporté par notre méthode, nous proposons un nouveau méta-modèle de type flux de données répondant à ce besoin. Ce nouveau méta-modèle permet la modélisation d’applications reconfigurables et modulaires tout en préservant la prédictibilité, la concision et la lisibilité des diagrammes de flux de données